Who can resist a good rhyme? Or a bad one?
So this round of dak hacking turned out to make the AJ Market scheme
another notch more confusing – hence the delay in blogging, and the
teaser in my last post. The issue leading to the confusion
is that the major item on the list to start hacking on was SCC, which
unlike the projects I’ve undertaken so far, is more than a day’s hacking.
In fact, due to the need to give mirrors a chance to adapt to the new
system between it being implemented and actually used, it’s actually
more of a multi-week task. And doing it as a one-day-a-week project would
extend that into a multi-month task afaics. I guess that’s still better
than never, but obviously it’s worth looking into alternatives.
Naturally, then, the first phase of a longer project like this is
threefold. We’ll call it “the three P’s”.
Planning
My theory at this point was to come up with a plan for what to do,
try figuring out how much work it’d take, and then see what sort of
financial arrangements might be plausible – not involving me cutting a few
weeks out of my life for spare change, but without making the whole thing
an unleapable chasm from what the AJ Market’s currently managing either.
I figured writing it up as a semi-formal proposal makes most sense:
Summary
Implementation of the outstanding mirror split proposal for the Debian
archive to allow new architectures, particularly amd64, to be included
in the archive.
Benefit
In spite (or perhaps because) of its simplicity, this project has been
languishing for over two years, and is not currently being worked on;
so at present it’s not even possible to estimate when it would otherwise
be completed.
It is most notably preventing amd64 from being integrated into the
normal Debian development environment, causing derived distributions to
maintain amd64 specific patches themselves.
In the longer term, reducing the constraints imposed on the archive
size may allow the introduction of additional suites, such as backports
or volatile, as well as additional architectures; though significant
further discussion on this would be needed.
Background
Since at least mid-2003 the Debian archive has been closed to new
architectures due to the already large amount of space and bandwidth
required to become a Debian mirror. At present, the archive uses some
158GiB of disk, and about 1GiB per day; additional architectures are
expected to require approximately an additional 10GiB each, and there
are likely around half a dozen architectures that will be considered
for addition once the moratorium on new architectures is rescinded (including
amd64, armeb, sh variants, kfreebsd and possibly partial architectures
for arch variants such as s390x and ppc64).
The primary work needed to fix this involves:
- ensuring the mirror network operates correctly when a majority of
mirrors are partial; this reduces the impact on bandwidth and storage
capacity
- optimising portions of the archive maintenance software, particularly
apt-ftparchive; this reduces the load on the archive server
- providing appropriate guidelines on the qualification criteria new
architectures need to meet in order to be added to the archive; this
provides a limit on future increases, allowing growth to be appropriately
controlled
Actual work
I expect there will be six phases to the project:
- clean up the archive as it stands, and establish a clear
categorisation of its contents to define what a partial mirror by
architecture or suite should officially contain
- provide appropriate scripts to ensure mirror sites can easily
comply with the previously defined expectations for partial mirroring
- devise an appropriate structure for the new mirror network, that
can easily incorporate existing mirrors, and coexist with the existing
structure
- provide information on the new structure to both mirror admins and
users; assist with the transition, and resolve any problems found
- ensure the archive management software is appropriately optimised,
and that archive inclusion criteria have been debated and established
- add new ports that have passed the qualification requirements to
the archive
In theory, a couple of days for each of those sounds plausible,
making that twelve days of actual work (with a couple of weeks’ delay in
between for mirrors to have some time to adapt to the new network). On
the downside, twelve days at a day a week is about three months of real
time, not counting the possibility of doing other things with the one
day a week, or Christmas, or the aforementioned delay for mirrors. Yick.
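For what it’s worth, the back-of-envelope sum can be written out as below. It’s only a sketch: “a couple of days” and “a couple of weeks” are taken as exactly two, which is an assumption, and the variable names are made up for illustration.

```python
import math

PHASES = 6              # phases listed in the proposal
DAYS_PER_PHASE = 2      # "a couple of days" each, assumed to mean exactly two
DAYS_PER_WEEK = 1       # one hacking day a week
MIRROR_DELAY_WEEKS = 2  # "a couple of weeks" for mirrors, assumed to mean two

# Total working days, then how many calendar weeks that takes at one day a week.
work_days = PHASES * DAYS_PER_PHASE
weeks_of_work = math.ceil(work_days / DAYS_PER_WEEK)

# Calendar time once the mirror adaptation delay is added on top.
total_weeks = weeks_of_work + MIRROR_DELAY_WEEKS

print(work_days, weeks_of_work, total_weeks)
```

Bumping DAYS_PER_WEEK is the obvious knob: at five days a week the same twelve days of work fit into three weeks before the mirror delay.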
So much for planning.
Preparation
So the next “p” is preparation. In this case that’s finally getting
around to fixing dak CVS, which has been slightly broken since May. The
extent of the actual breakage was just the loss of the ChangeLog history,
aiui (or at least, that was the unrecovered breakage), but the result of
that was months of uncommitted changes on both ftp-master and security
(and reportedly from Ubuntu’s dak installation too). The changelog for
the first set of commits (not counting buildd changes from ftp-master,
security changes or Ubuntu changes that haven’t made it to ftp-master)
looks like:
* tiffani: new script to do patches to Packages, Sources and Contents
files for quicker downloads.
* ziyi: update to authenticate tiffani generated files
* dak: new script to provide a single binary with less arbitrary names
for access to dak functionality.
* cindy: script implemented
* saffron: cope with suites that don't have a Priority specified
* heidi: use get_suite_id()
* denise: don't hardcode stable and unstable, or limit udebs to unstable
* denise: remove override munging for testing (now done by cindy)
* helena: expanded help, added new, sort and age options, and fancy headers
* jennifer: require description, add a reject for missing dsc file
* jennifer: change lock file
* kelly: propogation support
* lisa: honour accepted lock, use mtime not ctime, add override type_id
* madison: don't say "dep-retry"
* melanie: bug fix in output (missing %)
* natalie: cope with maintainer_override == None; add type_id for overrides
* nina: use mtime, not ctime
* katie.py: propogation bug fixes
* logging.py: add debugging support, use as the logfile separator
* katie.conf: updated signing key (4F368D5D)
* katie.conf: changed lockfile to dinstall.lock
* katie.conf: added Lisa::AcceptedLockFile, Dir::Lock
* katie.conf: added tiffani, cindy support
* katie.conf: updated to match 3.0r6 release
* katie.conf: updated to match sarge's release
* apt.conf: update for sarge's release
* apt.conf.stable: update for sarge's release
* apt.conf: bump daily max Contents change to 25MB from 12MB
* cron.daily: add accepted lock and invoke cindy
* cron.daily: add daily.lock
* cron.daily: invoke tiffani
* cron.daily: rebuild accepted buildd stuff
* cron.daily: save rene-daily output on the web site
* cron.daily: disable billie
* cron.daily: add stats pr0n
* cron.hourly: invoke helena
* pseudo-packages.maintainers,.descriptions: miscellaneous updates
* vars: add lockdir, add etch to copyoverrides
* Makefile: add -Ipostgresql/server to CXXFLAGS
* docs/: added README.quotes
* docs/: added manpages for alicia, catherine, charisma, cindy, heidi,
julia, katie, kelly, lisa, madison, melanie, natalie, rhona.
* TODO: correct spelling of "conflicts"
Ugh. Still, that’s enough to start work.
And the final “P”? Come on, be honest with yourself, you know what it’s
going to be.
Procrastination
Okay, that’s not entirely fair; the irrelevant bit of work was
actually on the TODO list before SCC (mostly because it was something
that I could get done reasonably quickly) and in fact was this line of
the above changelog:
* dak: new script to provide a single binary with less arbitrary names
for access to dak functionality.
All the various model/actress names have been getting more than
a little confusing recently, with almost forty in the dak suite, and
another two dozen or so in use elsewhere – and then there’s the fact
that the whole hot babes thing is both a bit offensive, and getting a
bit old. OTOH, you need something to rename them to.
We ended up deciding on the “version control solution” and introducing a
“dak” command that’d launch all the different little bits of functionality
depending on arguments, in the same way cvs, svn, tla, bzr, darcs etc do.
The implementation’s kinda neat: we have a list of commands (like
“ls”) and their description (“Show which suites packages are in”),
along with the python module and function they’re in (“madison”,
“main()”). That lets us not actually have to change any of the other
scripts immediately, and lets “dak ls foo” work the same as “madison
foo”. It also means that down the track we don’t need to have separate
modules for each subcommand, and that we can rename modules and functions
without affecting the user interface.
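Roughly, the shape of that table-driven dispatch looks like the sketch below. This isn’t the actual dak code: the names COMMANDS, find_command, usage and dispatch are made up for illustration, and a stand-in madison module is registered so the example runs on its own. Only the “ls” to madison main() pairing and its description come from the text above.

```python
import importlib
import sys
import types

# Hypothetical command table: (subcommand, module, function, description).
COMMANDS = [
    ("ls", "madison", "main", "Show which suites packages are in"),
]

def find_command(name):
    """Return (module, function, description) for a subcommand, or raise KeyError."""
    for cmd, module, function, description in COMMANDS:
        if cmd == name:
            return module, function, description
    raise KeyError(name)

def usage():
    """Build the help text from the same table that drives dispatch."""
    lines = ["Usage: dak COMMAND [...]", "Available commands:"]
    for cmd, _module, _function, description in COMMANDS:
        lines.append("  %-10s %s" % (cmd, description))
    return "\n".join(lines)

def dispatch(argv):
    """Run `dak COMMAND args...` by importing the module and calling its function."""
    if len(argv) < 2:
        print(usage())
        return 1
    try:
        module_name, function_name, _description = find_command(argv[1])
    except KeyError:
        print(usage())
        return 1
    module = importlib.import_module(module_name)
    function = getattr(module, function_name)
    # Let the old script see the argv it would get if it were run directly.
    old_argv = sys.argv
    sys.argv = [module_name] + argv[2:]
    try:
        return function()
    finally:
        sys.argv = old_argv

# Stand-in for the real madison module, so the sketch is self-contained.
_fake_madison = types.ModuleType("madison")

def _fake_main():
    return "madison called with %s" % sys.argv[1:]

_fake_madison.main = _fake_main
sys.modules["madison"] = _fake_madison
```

With that in place, dispatch(["dak", "ls", "foo"]) ends up calling madison’s main() with foo as its only argument, and renaming or merging the module later only means editing its row in the table.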
Of course it also means that all the internal scripts haven’t changed
to use the new names yet, leaving the new interface a bit underused,
but hey, “dak ls” at least manages to be one character shorter than
“madison”, so that’s a win!
To be continued…